Machine translation using bilingual term entries extracted from parallel texts
نویسنده
چکیده
Patent summaries are machine-translated using bilingual term entries extracted from parallel texts for evaluation. The result shows that bilingual term entries extracted from 2,000 pairs of parallel texts which share a specific domain with the input texts introduce more improvements than a technical term dictionary with 38,000 entries which covers a broader domain. The result also shows that only 10 pairs of parallel texts found by similar document retrieval have comparable effects to the technical term dictionary, suggesting that parallel texts to be used do not need to be classified into fields prior to term extraction.
منابع مشابه
Improving the Performance of an Example-Based Machine Translation System Using a Domain-specific Bilingual Lexicon
In this paper, we study the impact of using a domain-specific bilingual lexicon on the performance of an Example-Based Machine Translation system. We conducted experiments for the EnglishFrench language pair on in-domain texts from Europarl (European Parliament Proceedings) and out-of-domain texts from Emea (European Medicines Agency Documents), and we compared the results of the Example-Based ...
متن کاملEnriching a statistical machine translation system trained on small parallel corpora with rule-based bilingual phrases
In this paper, we present a new hybridisation approach consisting of enriching the phrase table of a phrase-based statistical machine translation system with bilingual phrase pairs matching structural transfer rules and dictionary entries from a shallowtransfer rule-based machine translation system. We have tested this approach on different small parallel corpora scenarios, where pure statistic...
متن کاملAutomatic transfer rule induction from parallel corpora
Recently, many projects have been proposed aiming at automatically transforming the multilingual information available on parallel texts into linguistic knowledge useful for machine translation. This paper describes an ongoing PhD project in which the main goal is to automatically induce transfer rules and bilingual dictionaries from part-of-speech tagged and lexically aligned parallel corpora....
متن کاملAutomatic induction of translation lexicons from aligned parallel corpus
Translation lexicons are one of the most important linguistic resources for machine translation. However, this bilingual set of word and multiword correspondences requires a lot of manual work to be built. This paper describes a method to automatically build translation lexicons by extracting knowledge from PoS-tagged and lexically aligned parallel corpora. Preliminary experiments were carried ...
متن کاملIdentifying Word Correspondences in Parallel Texts
Researchers in both machine translation (e.g., Brown et a/, 1990) arm bilingual lexicography (e.g., Klavans and Tzoukermarm, 1990) have recently become interested in studying parallel texts (also known as bilingual corpora), bodies of text such as the Canadian Hansards (parliamentary debates) which are available in multiple languages (such as French and English). Much of the current excitement ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Systems and Computers in Japan
دوره 36 شماره
صفحات -
تاریخ انتشار 2005